ggplot2 Tutorial

Cedric Scherer (scherer@izw-berlin.de)

Leibniz Institute for Zoo and Wildlife Research (IZW) Berlin

2017-07-06

Introductory Words

The initial version of this tutorial follows a blog entry called Beautiful plotting in R: A ggplot2 cheatsheet by zev@zevross.com, posted on 4 August 2014 and last updated in January 2016.

Major changes were made:

Preparation

Load ggplot2

library(ggplot2)

The Dataset

We are using data from the National Morbidity and Mortality Air Pollution Study (NMMAPS). To make the plots manageable we are limiting the data to Chicago and 1997-2000. For more detail on this dataset, consult Roger Peng’s book Statistical Methods in Environmental Epidemiology with R.

chic <- readRDS("chicago-nmmaps.Rds")
str(chic)
## 'data.frame':    1461 obs. of  10 variables:
##  $ city    : chr  "chic" "chic" "chic" "chic" ...
##  $ date    : Date, format: "1997-01-01" "1997-01-02" "1997-01-03" "1997-01-04" ...
##  $ death   : int  137 123 127 146 102 127 116 118 148 121 ...
##  $ temp    : num  36 45 40 51.5 27 17 16 19 26 16 ...
##  $ dewpoint: num  37.5 47.2 38 45.5 11.2 ...
##  $ pm10    : num  13.1 41.9 27 25.1 15.3 ...
##  $ o3      : num  5.66 5.53 6.29 7.54 20.76 ...
##  $ time    : int  3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 ...
##  $ season  : chr  "Winter" "Winter" "Winter" "Winter" ...
##  $ year    : chr  "1997" "1997" "1997" "1997" ...
head(chic, 10)
##      city       date death temp dewpoint      pm10        o3 time season year
## 3654 chic 1997-01-01   137 36.0   37.500 13.052268  5.659256 3654 Winter 1997
## 3655 chic 1997-01-02   123 45.0   47.250 41.948600  5.525417 3655 Winter 1997
## 3656 chic 1997-01-03   127 40.0   38.000 27.041751  6.288548 3656 Winter 1997
## 3657 chic 1997-01-04   146 51.5   45.500 25.072573  7.537758 3657 Winter 1997
## 3658 chic 1997-01-05   102 27.0   11.250 15.343121 20.760798 3658 Winter 1997
## 3659 chic 1997-01-06   127 17.0    5.750  9.364655 14.940874 3659 Winter 1997
## 3660 chic 1997-01-07   116 16.0    7.000 20.228428 11.920985 3660 Winter 1997
## 3661 chic 1997-01-08   118 19.0   17.750 33.134819  8.678477 3661 Winter 1997
## 3662 chic 1997-01-09   148 26.0   24.000 12.118381 13.355892 3662 Winter 1997
## 3663 chic 1997-01-10   121 16.0    5.375 24.761534 10.448264 3663 Winter 1997

A Default ggplot

The syntax of ggplot2 differs from base R. We always start a plot by calling ggplot(data, aes(variable1, variable2)), which only tells ggplot2 which data and variables we are going to work with. Running this on its own therefore creates just a panel, because ggplot2 does not yet know how we want to plot the data.

(g <- ggplot(chic, aes(date, temp)))

Tip: Wrapping an assignment in parentheses prints the object immediately (instead of writing g <- ggplot(...) and then g).

Let’s tell ggplot which style we want to use:

g + geom_point()

(No worries, we are going to learn several plot types later.)

Change Color of Points

Within this command, you can already set properties such as the color of your points:

(g <- g + geom_point(color = "firebrick"))

Because we assign the result back to our plotting element, all following plots based on g are going to have red points.

Working with Axes

Add Axis Labels

Let’s add some well-written labels to the axes:

(g <- g + labs(x = "Date", y = expression(paste("Temperature (", degree ~ F, ")"))))

Again, we are updating our plotting element g (which means axes labels will be the same in the plots following afterwards).

Move Labels Away from the Plot & Change Color

theme() is an essential command to modify all kinds of theme elements (texts and titles, boxes, symbols, backgrounds, …). We will use a lot of them – to see what is possible have a look here.

g + theme(axis.title.x = element_text(color = "sienna", size = 15, vjust = -0.35),
          axis.title.y = element_text(color = "orangered", size = 15, vjust = 0.35))

Change Size & Angle of Tick Text

Using angle and vjust you can adjust the position of the text (0 = left-aligned, 0.5 = centered, 1 = right-aligned):

g + theme(axis.text.x = element_text(angle = 50, size = 16, vjust = 0.5))

Remove Axis Ticks & Tick Text

There is rarely a reason to do so, but this is how it works:

g + theme(axis.ticks.y = element_blank(), axis.text.y = element_blank())

Limit Axis Range

Sometimes you want to zoom into your data. You can do this without subsetting your data:

g + ylim(c(0, 50))

Alternatively you can use g + scale_y_continuous(limits = c(0, 50)) or g + coord_cartesian(ylim = c(0, 50)). The former removes all data points outside the range, while the latter only adjusts the visible area.
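The difference between scale limits and coordinate limits can be checked on a small example. The following is a minimal sketch that uses a hypothetical toy data frame (toy, not part of the tutorial data) and inspects the data ggplot2 has prepared for drawing: with scale limits, values outside the range are turned into NA, whereas coord_cartesian() keeps every value and only crops the view.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Chicago temperatures
toy <- data.frame(day = 1:10, temp = seq(5, 95, by = 10))

base <- ggplot(toy, aes(day, temp)) + geom_point()

# Scale limits: values outside 0-50 become NA before drawing
p_scale <- base + scale_y_continuous(limits = c(0, 50))

# Coordinate limits: all values are kept, only the visible area changes
p_coord <- base + coord_cartesian(ylim = c(0, 50))

# Count the y values that survive in the built plot data
n_scale <- sum(!is.na(ggplot_build(p_scale)$data[[1]]$y))
n_coord <- sum(!is.na(ggplot_build(p_coord)$data[[1]]$y))
```

This matters as soon as a statistic such as a smoother or a box plot is involved, because with scale limits it is computed only from the values that remain.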

Force Plot to Start at Origin

Related to that, you can force R to plot the graph starting at the origin:

library(tidyverse)
chic_red <- filter(chic, temp > 25, o3 > 20)

ggplot(chic_red, aes(temp, o3)) + 
  geom_point() + 
  labs(x = expression(paste("Temperature higher than 25 ", degree ~ F, "")), y = "Ozone higher than 20 ppb") +
  expand_limits(x = 0, y = 0)

Using coord_cartesian(xlim = c(0, max(chic_red$temp)), ylim = c(0, max(chic_red$o3))) will lead to the same result.

Axes with Same Scaling

For demonstration purposes, let’s plot Temperature against Temperature with some random noise:

ggplot(chic, aes(temp, temp + rnorm(nrow(chic), sd = 20))) +
   geom_point() +
   labs(x = "Temperature") +
   xlim(c(0, 150)) + ylim(c(0, 150)) +
   coord_equal()

Use a Function to Alter Labels

Sometimes it is handy to alter your labels a little, perhaps adding units or percent signs without adding them to your data. You can use a function in this case. Here is an example:

ggplot(chic, aes(date, temp)) +
   geom_point(color = "firebrick") +
   labs(x = "Year", y = "Temperature") +
   scale_y_continuous(labels = function(x) paste(x, "Degrees Fahrenheit"))
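A label function simply receives the vector of axis breaks and returns one string per break, so it can be written and tried out on its own before it is handed to the scale. The sketch below assumes a hypothetical helper add_unit() and a toy data frame (neither is part of the tutorial data):

```r
library(ggplot2)

# Hypothetical helper: append a unit to every axis break
add_unit <- function(x) paste(x, "Degrees Fahrenheit")

# Try the helper on some break values first
example_labels <- add_unit(c(0, 50))

# Toy data standing in for the Chicago temperatures
toy <- data.frame(day = 1:5, temp = c(10, 30, 50, 70, 90))

p_lab <- ggplot(toy, aes(day, temp)) +
  geom_point() +
  scale_y_continuous(labels = add_unit)
```

Defining the function once and passing it by name keeps the plotting code short when the same formatting is needed in several plots.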

Working with Titles

Add a Title

(g <- g + ggtitle("Temperatures in Chicago"))

Alternatively, you can use g + labs(title = "Temperatures in Chicago"). Here you can add several arguments, e.g. additionally a subtitle and a caption:

g + labs(title = "Temperatures in Chicago", 
         subtitle = "Seasonal pattern of daily temperatures from 1997 to 2001", 
         caption = "Data from NMMAPS")

Make Title Bold & Add a Space at the Baseline

(g <- g + theme(plot.title = element_text(size = 15, face = "bold", margin = margin(10, 0, 10, 0))))  ## top, right, bottom, left

The margin argument uses the margin function and you provide the top, right, bottom and left margins (the default unit is points).

Adjust Position of Titles

Alignment is controlled by hjust (which stands for horizontal adjustment):

g + theme(plot.title = element_text(size = 15, face = 4, hjust = 1))

Use a Non-Traditional Font in Your Title

Note that you can also use different fonts. To use fonts which are installed on your machine (and which you may be using in your office program) we get help from a package called extrafont. It is not as easy as it seems here; check out this post if you need to use different fonts.

After loading the package, you need to import and load the fonts installed on your device:

library(extrafont)
font_import()
## Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system.
## Continue? [y/n]
loadfonts(device = "win")

You can have a look at your imported font library by typing fonts() or fonttable().

Now, we can use one of those font families:

g + theme(plot.title = element_text(size = 18, family = "Merriweather"))

Change Spacing in Multi-Line Text

You can use the lineheight argument to change the spacing between lines. In this example, I have squished the lines together a bit (lineheight < 1).

g + ggtitle("Temperatures in Chicago\nfrom 1997 to 2001") + 
      theme(plot.title = element_text(size = 16, face = "bold", vjust = 1, lineheight = 0.75))

Working with Legends

We will color code the plot based on season. You can see that by default the legend title is what we specified in the color argument:

(g <- ggplot(chic, aes(date, temp, color = factor(season))) +
         geom_point() +
         labs(x = "Year", y = "Temperature"))

Turn Off the Legend

One of the first questions is always: “How can I get rid of the legend?”

It is quite easy and always works with legend.position = "none":

g + theme(legend.position = "none")

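You can also use guides(color = "none") or scale_color_discrete(guide = "none"), depending on the specific case. Here the legend belongs to the color aesthetic; for a fill legend use the fill counterparts.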

Change Order of Legend Keys

We can achieve this by changing the levels of season:

chic$season <- factor(chic$season, levels = c("Spring", "Summer", "Autumn", "Winter"))
(g <- ggplot(chic, aes(date, temp, color = factor(season))) +
        geom_point() +
        labs(x = "Year", y = "Temperature"))
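Why this works can be seen without any plotting: the legend follows the order of the factor levels. The sketch below uses a short hypothetical vector in place of chic$season and compares the default level order with an explicitly specified one.

```r
# Toy vector standing in for chic$season
season <- c("Winter", "Spring", "Summer", "Autumn", "Winter")

# Without explicit levels, factor() sorts the unique values alphabetically
default_levels <- levels(factor(season))

# With explicit levels, the given order is kept
custom_levels <- levels(factor(season,
                               levels = c("Spring", "Summer", "Autumn", "Winter")))
```

Any value that is missing from the levels argument would become NA, so it is worth checking the spelling of the levels against the data.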

Turn Off Legend Titles

g + theme(legend.title = element_blank())

Change Style of Legend Titles

g + theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold"))

Change Legend Title

The legend details can be changed via scale_color_discrete or scale_color_continuous, depending on the type of variable displayed.

g + theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) +
    scale_color_discrete(name = "Seasons\nindicated\nby colors:")

Note that you can use the short command which is scale_color_discrete("Seasons\nindicated\nby colors:"). In most cases the string is interpreted as name (but sometimes you need to include it e.g. when using custom themes).

Change Legend Labels

We are going to replace the seasons by the months which they are covering:

g + theme(legend.title = element_text(colour = "chocolate", size = 14, face = 2)) +
    scale_color_discrete("Seasons:", labels=c("Mar - May", "Jun - Aug", "Sep - Nov", "Dec - Feb"))

Change Background Boxes in the Legend

g + theme(legend.key = element_rect(fill = "darkgoldenrod1"),
          legend.title = element_text(colour = "chocolate", size = 14, face = 2)) +
    scale_color_discrete("Seasons:")

If you want to get rid of them entirely use fill = NA.

Change Size of Legend Symbols

Points in the legend get a little lost, especially without the boxes. To override the default try:

g + theme(legend.key = element_rect(fill = NA),
         legend.title = element_text(colour = "chocolate", size = 14, face = 2)) +
    scale_color_discrete("Seasons:") +
    guides(color = guide_legend(override.aes = list(size = 6)))

Leave a Layer Off the Legend

Let’s say you have a point layer and you add a rug plot of the same data. By default, both the points and the “line” end up in the legend like this:

g + geom_rug() +
    theme(legend.title = element_text(colour = "chocolate", size = 14, face = 2)) +
    scale_color_discrete("Seasons:")

You can use show.legend = F to turn off a layer in the legend:

g + geom_rug(show.legend = F) +
    theme(legend.title = element_text(colour = "chocolate", size = 14, face = 2)) +
    scale_color_discrete("Seasons:")

Manually Adding Legend Items

ggplot2 will not add a legend automatically unless you map aesthetics (color, size etc.) to a variable. There are times, though, when you want to have a legend so that it is clear what you are plotting.

Here is the default:

ggplot(chic, aes(x = date, y = o3)) +
   geom_line(color = "grey") +
   geom_point(color = "red") +
   labs(x = "Year", y = "Ozone")

We can force a legend by mapping to a variable. We are mapping the lines and the points using aes and we are mapping not to a variable in our dataset but to a single string (so that we get just one color for each).

ggplot(chic, aes(x = date, y = o3)) +
   geom_line(aes(color = "line")) +
   geom_point(aes(color = "points")) +
   labs(x = "Year", y = "Ozone") +
   scale_color_discrete("Type:")

We are getting close but this is not what we want. We want grey and red! To change the colors, we use scale_color_manual(). Additionally, we override the legend aesthetics using the guides() function.

Voila! Now, we have a plot with grey lines and red points as well as a single grey line and a single red point as legend symbols:

ggplot(chic, aes(x = date, y = o3)) + 
   geom_line(aes(color = "line")) +  
   geom_point(aes(color = "points")) +
   labs(x = "Year", y = "Ozone") +
   scale_color_manual("", values = c("points" = "red", "line" = "grey"), guide = "legend") +
   guides(colour = guide_legend(override.aes = list(linetype = c(1, 0), shape = c(NA, 16))))

Working with Backgrounds & Grid Lines

There are ways to change the entire look of your plot with one function (see below) but if you want to simply change the colors of some elements, you can also do that.

Change the Panel Color

(g <- ggplot(chic, aes(date, temp)) +
         geom_point(color = "firebrick") +
         labs(x = "Year", y = "Temperature") +
         theme(panel.background = element_rect(fill = "white")))

Change Grid Lines

There are two types of grid lines: major grid lines indicating the ticks and minor grid lines between the major ones.

(g <- g + theme(panel.grid.major = element_line(colour = "grey10", size = 0.5),
               panel.grid.minor = element_line(colour = "grey70", size = 0.25)))

Furthermore, you can also define the breaks for both the major and the minor grid lines:

g + scale_y_continuous(breaks = seq(0, 100, 10), minor_breaks = seq(0, 100, 2.5))

Change the Plot Background Color

g + theme(plot.background = element_rect(fill = "grey60"))

Working with Margins

Sometimes it is useful to add a little space to the plot margin. Similar to the previous examples we can use an argument to the theme() function, in this case plot.margin. In the previous example we already illustrated the default margin by changing the background color using plot.background.

Now let us add extra space to both the left and right. The argument, plot.margin, can handle a variety of different units (cm, inches, etc.) but it requires the use of the function unit from the package grid to specify the units. Here I am using a 5 cm margin on the right and left.

g + theme(plot.background = element_rect(fill = "grey60"),
          plot.margin = unit(c(1, 5, 1, 5), "cm"))  ## top, right, bottom, left
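As a possible alternative, the margin() helper that was used for the title spacing earlier also accepts a unit argument, so the sides can be named instead of remembered by position. This is a minimal sketch on a hypothetical toy data frame, not the tutorial data:

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Same idea as above: 1 cm top and bottom, 5 cm right and left
p_margin <- ggplot(toy, aes(x, y)) +
  geom_point() +
  theme(plot.background = element_rect(fill = "grey60"),
        plot.margin = margin(t = 1, r = 5, b = 1, l = 5, unit = "cm"))
```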

Working with Multi-Panel Plots

The ggplot2 package has two nice functions for creating multi-panel plots. They are related but a little different: facet_wrap essentially creates a ribbon of plots based on a single variable, while facet_grid can take two variables.

Create a Single Row of Plots Based on One Variable

g <- ggplot(chic, aes(date, temp)) +
       geom_point(color = "chartreuse4") +
       labs(x = "Year", y = "Temperature")
g + facet_wrap(~year, nrow = 1)

Create a Matrix of Plots Based on One Variable

g + facet_wrap(~year, nrow = 2)

Allow Scales to Roam Free

The default for multi-panel plots in ggplot2 is to use equivalent scales in each panel. But sometimes you want to allow a panel’s own data to determine the scale. This is not often a good idea since it may give your user the wrong impression about the data, but to do this you can set scales = "free" like this:

g + facet_wrap(~year, nrow = 2, scales = "free")

Note that both the x and y axes now differ in their range!

Create a Grid of Plots Based on Two Variables

ggplot(chic, aes(date, temp)) +
   geom_point(color = "orangered") +
   labs(x = "Year", y = "Temperature") +
   facet_grid(year~season)

To change from row to column arrangement you can change facet_grid(year~season) to facet_grid(season~year).
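The difference between the two facetting functions also shows up in the number of panels they create. The sketch below uses the built-in mtcars data as a stand-in (an assumption, it is not the tutorial data) and reads the panel layout from the built plot: facet_wrap() creates one panel per level of the variable, while facet_grid() creates one panel for every combination of the row and column variables, including combinations without data.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# One panel per number of cylinders (3 levels)
p_wrap <- base + facet_wrap(~cyl, nrow = 1)

# One panel per cylinder-by-gear combination (3 x 3 levels)
p_grid <- base + facet_grid(cyl ~ gear)

# The layout table of a built plot has one row per panel
n_wrap <- nrow(ggplot_build(p_wrap)$layout$layout)
n_grid <- nrow(ggplot_build(p_grid)$layout$layout)
```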

Put Two (Different) Plots Side by Side

Doing this is not nearly as straightforward as traditional (base) graphics. Here are two approaches:

p1 <- ggplot(chic, aes(date, temp, color = factor(season))) + 
         geom_point() + labs(x = "Year", y = "Temperature") + guides(colour = F) 
p2 <- ggplot(chic, aes(x = date, y = o3)) + 
         geom_line(color = "grey") + geom_point(color = "red") + 
         labs(x = "Year", y = "Ozone")

library(grid)
pushViewport(viewport(layout = grid.layout(1, 2)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))

Alternatively, this way might be a little bit easier (this time the plots include legends, but that is independent of the method):

p1 <- ggplot(chic, aes(date, temp, color = factor(season))) + 
         geom_point() + labs(x = "Year", y = "Temperature") + 
         theme(legend.title = element_blank())
p2 <- ggplot(chic, aes(x = date, y = o3)) + 
         geom_line(aes(color = "line")) + geom_point(aes(color = "points")) + 
         labs(x = "Year", y = "Ozone") +
         scale_color_manual("", values = c("points" = "red", "line" = "grey"), guide = "legend") +
         guides(colour = guide_legend(override.aes = list(linetype = c(1, 0), shape = c(NA, 16))))

library(gridExtra)
grid.arrange(p1, p2, ncol = 2)

Working with Colors

For simple applications working with colors is straightforward in ggplot2 but when you have more advanced needs it can be a challenge. For a more advanced treatment of the topic you should probably get your hands on Hadley’s book which has nice coverage. There are a few other good sources including the R Cookbook and the ggplot2 online docs. Tian Zheng at Columbia has created a useful PDF of R colors.

In order to use color with your data, most importantly you need to know if you are dealing with a categorical or continuous variable.

Categorical Variables: Manually Select Colors

(g <- ggplot(chic, aes(date, temp, color = factor(season))) +
      geom_point() + 
      labs(x = "Year", y = "Temperature") +
      theme(legend.title = element_blank()) +
      scale_color_manual(values = c("dodgerblue4", "darkolivegreen4", "darkorchid3", "goldenrod1")))

Categorical Variables: Use Built-In Palettes

g + scale_color_brewer(palette = "Set1")

You can ignore the message in the console: replacing the existing scale is what we want.

Categorical Variables: Use Tableau colors based on ggthemes

library(ggthemes)

g + scale_color_tableau()

Continuous Variables: Manually Select Colors

In our example we will change the color variable to ozone, a continuous variable that is strongly related to temperature (higher temperature = higher ozone). The function scale_color_gradient() is a sequential gradient while scale_color_gradient2() is diverging.

Here is the default ggplot2 continuous color scheme (sequential color scheme):

(g <- ggplot(chic, aes(date, temp, color = o3)) + 
         geom_point() + 
         labs(x = "Year", y = "Temperature") +
         scale_color_continuous("Ozone:"))

This code produces the same plot (apart from the default legend title):

ggplot(chic, aes(date, temp, color = o3)) +  
   geom_point() + 
   labs(x = "Year", y = "Temperature") +
   scale_color_gradient()

Continuous Variables: Manually Set a Sequential Color Scheme

g + scale_color_gradient(low = "darkkhaki", high = "darkgreen", "Ozone:")

Temperature data is normally distributed, so how about a diverging color scheme (rather than a sequential one)? For diverging colors you can use the scale_color_gradient2 function:

mid <- max(chic$o3) / 2  ## or mid <- mean(chic$o3)

g + theme(panel.background = element_rect(fill = "grey60")) + 
    scale_color_gradient2(midpoint = mid, low = "blue4", mid = "white", high = "red4", "Ozone:")

Continuous Variables: Use the Beautiful Viridis Color Palette

The Viridis color palettes do not only make your plots look pretty and good to perceive but also easier to read by those with colorblindness and print well in grey scale:

Figure 1: Desaturated Color Palettes for Printing

Figure 2: Color Palettes Appearing under Green-Blindness (Deuteranopia)

You can test how your plots might appear under various forms of colorblindness using the dichromat package.

The following multi-panel plot illustrates two out of the four viridis palettes:

library(viridis)
p1 <- g + scale_color_viridis("Ozone:") + ggtitle("Viridis 'default'")
p2 <- g + scale_color_viridis(option = "inferno", "Ozone:") + ggtitle("Viridis 'inferno'")
library(gridExtra)
grid.arrange(p1, p2, ncol = 2)

It is also possible to use the viridis color palettes for discrete variables:

ggplot(chic, aes(date, temp, color = factor(season))) +
   geom_point() + 
   labs(x = "Year", y = "Temperature") +
   theme(legend.title = element_blank(), 
         panel.background = element_rect(fill = "grey50"), 
         legend.key = element_rect(fill = "grey50")) +
   scale_color_viridis(discrete = T)

Working with Lines

Add Horizontal or Vertical Lines to a Plot

You might want to highlight a given range or threshold, which can be done plotting a line at these defined coordinates using geom_hline() (for “horizontal lines”) or geom_vline() (for “vertical lines”):

g + geom_hline(yintercept = c(20, 73))

ggplot(chic, aes(temp, o3)) +
   geom_point() +
   labs(x = "Temperature", y = "Ozone") + 
   geom_vline(xintercept = quantile(chic$temp)[4], linetype = 2, color = "firebrick", size = 2) +
   geom_hline(yintercept = quantile(chic$o3)[4], linetype = 2, color = "firebrick", size = 2)

If you want to add a line that is neither horizontal nor vertical, you need to use geom_abline(). This is for example the case if you add a regression line:

reg <- lm(o3 ~ temp, data = chic)
ggplot(chic, aes(temp, o3)) +
   geom_point() +
   labs(caption = paste0("y = ", round(coefficients(reg)[2], 2), " * x + ", round(coefficients(reg)[1], 2)), 
        x = "Temperature", y = "Ozone") + 
   geom_abline(intercept = coefficients(reg)[1], slope = coefficients(reg)[2], color = "darkorange1", size = 1.5)

Later, we will learn how to add a linear fit with one command using stat_smooth(method = "lm"). However, there might be other reasons to add a line with a given slope.
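The caption built with paste0() in the regression example prints a negative intercept as "+ -…". One possible way around this is sprintf() with a signed format. The sketch below uses the built-in cars data as a stand-in for the Chicago data (an assumption); the names fit, b and eq_label are only illustrative.

```r
# Linear fit on the built-in cars data (stand-in for the Chicago data)
fit <- lm(dist ~ speed, data = cars)

# b[1] is the intercept, b[2] the slope
b <- unname(coefficients(fit))

# "%+.2f" always prints the sign, so no separate "+" is needed
eq_label <- sprintf("y = %.2f * x %+.2f", b[2], b[1])
```

The resulting string can be passed to labs(caption = eq_label), and b[1] and b[2] to the intercept and slope arguments of geom_abline().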

Working with Text

Add Labels to Your Data

Sometimes, we want to label our data points. To avoid overlapping and crowded text labels, we use a 1% sample of the original data, equally representing the four seasons.

set.seed(2017)
library(tidyverse)
sample <- chic %>% group_by(season) %>% sample_frac(0.01)
# without the pipe: sample <- sample_frac(group_by(chic, season), 0.01)

ggplot(sample, aes(date, temp, label = season)) +
   geom_point() + 
   geom_text(aes(colour = factor(temp)), hjust = 0.5, vjust = -0.5) +
   labs(x = "Year", y = "Temperature") +
   xlim(as.Date(c('1997-01-01', '2000-12-31'))) + ylim(c(0, 90)) +
   theme(legend.position = "none")

Okay, avoiding overlays of labels did not work out. But don’t worry, we are going to fix it in a minute!

You can also use geom_label for boxes:

ggplot(sample, aes(date, temp, label = season)) +
   geom_point() + 
   geom_label(aes(fill = factor(temp)), colour = "white", fontface = "bold", hjust = 0.5, vjust = -0.25) +
   labs(x = "Year", y = "Temperature") +
   xlim(as.Date(c('1997-01-01', '2000-12-31'))) + ylim(c(0, 90)) +
   theme(legend.position = "none")

A cool thing is the ggrepel package, which provides geoms for ggplot2 that repel overlapping text labels such as those in our examples above. Here, we show both the original data and our sample data, which gets labeled:

library(ggrepel)
ggplot(chic, aes(date, temp, label = season)) +
   geom_point() +
   geom_point(data = sample, aes(color = factor(temp)), size = 2.5) +
   geom_label_repel(data = sample, aes(fill = factor(temp)), colour = "white", fontface = "bold") +
   labs(x = "Year", y = "Temperature") +
   theme(legend.position = "none")

This also works for the pure text labels by using geom_text_repel. Have a look at all the usage examples.

Add Text Annotation in the Top-Right, Top-Left etc.

With ggplot2 you can set annotation coordinates to Inf but this is only moderately useful. Here is an example (based on code from this Google group) using the library grid that allows you to specify the location based on scaled coordinates where 0 is low and 1 is high.

The grobTree function from the grid package creates a grid graphical object and textGrob creates the text graphical object. The annotation_custom() function comes from ggplot2 and is designed to use a grob as input.

library(grid)
my_grob = grobTree(textGrob("This text stays in place!", x = 0.1, y = 0.95, hjust = 0, gp = gpar(col = "blue", fontsize = 15, fontface = "italic")))

ggplot(chic, aes(temp, o3)) +
   geom_point(color = "firebrick") + 
   labs(x = "Temperature", y ="Ozone") +
   annotation_custom(my_grob)

The value of this is particularly evident when you have multiple plots with different scales. In the plot below you see that the axis scales vary, yet the same code as above can be used to put the annotation in the same place on each facet.

ggplot(chic, aes(temp, o3)) +
   geom_point(color = "firebrick") + 
   labs(x = "Temperature", y ="Ozone") +
   facet_wrap(~season, scales = "free") +
   annotation_custom(my_grob)

Working with Coordinates

Flip a Plot

It is incredibly easy to flip your plot on its side. Here I have added coord_flip(), which is all you need to flip the plot (by the way, we are trying a new plot type by using geom_boxplot()).

ggplot(chic, aes(x = season, y = o3)) +
   geom_boxplot(fill = "indianred") + 
   labs(x = "Season", y = "Ozone") +
   coord_flip()

Working with Plot Types

Alternatives to the Box Plot

Box plots are great, but they can be so incredibly boring. There are alternatives, but first we are plotting a common box plot:

g <- ggplot(chic, aes(x = season, y = o3)) + 
         labs(x = "Season", y = "Ozone")
g + geom_boxplot(fill = "indianred")

Effective? Yes.

Interesting? No.

1. Alternative: Plot of Points

g + geom_point(color = "firebrick")

Not only boring but uninformative. One could add transparency to deal with overplotting, but this is not good either.

2. Alternative: Jitter the Points

Try adding a little jitter to the data. I like this for in-house visualization but be careful using jittering because you are purposely adding noise to your data and this can result in misinterpretation of your data.

g + geom_jitter(alpha = 0.5, aes(color = season), position = position_jitter(width = 0.6)) +
         theme(legend.title = element_blank())

3. Alternative: Violin Plots

Violin plots, similar to box plots except you are using a kernel density to show where you have the most data, are a useful visualization.

g + geom_violin(color = "sienna", fill = "red", alpha = 0.4)

4. Alternative: Combining Violin Plots with Jitter

g + geom_violin(color = "gray", alpha = 0.5) +
    geom_jitter(aes(color = season), position = position_jitter(width = 0.3), alpha = 0.3) +
    theme(legend.title = element_blank()) +
    coord_flip()

Create a Rug Representation to a Plot

ggplot(chic, aes(date, temp, color = factor(season))) +
   geom_point() +
   geom_rug() +
   labs(x = "Year", y = "Temperature") +
   theme(legend.position = "none")

Create a Tiled Correlation Plot

First step is to create the correlation matrix. We are using Pearson because all the variables are fairly normally distributed (but you may consider Spearman if your variables follow a different pattern). Note that since a correlation matrix has redundant information we are setting half of it to NA.

corm <- round(cor(chic[ ,sort(c("death", "temp", "dewpoint", "pm10", "o3"))], 
                  method = "pearson", use = "pairwise.complete.obs"), 2)
corm[lower.tri(corm)] <- NA
corm
##          death dewpoint    o3 pm10  temp
## death        1    -0.47 -0.24 0.00 -0.49
## dewpoint    NA     1.00  0.45 0.33  0.96
## o3          NA       NA  1.00 0.21  0.53
## pm10        NA       NA    NA 1.00  0.37
## temp        NA       NA    NA   NA  1.00

Now we put the resulting matrix in long format using the melt function from the reshape2 package and drop the records with NA values:

library(reshape2)
corm <- melt(corm)
corm$Var1 <- as.character(corm$Var1)
corm$Var2 <- as.character(corm$Var2)
corm <- na.omit(corm)
head(corm, 10)
##        Var1     Var2 value
## 1     death    death  1.00
## 6     death dewpoint -0.47
## 7  dewpoint dewpoint  1.00
## 11    death       o3 -0.24
## 12 dewpoint       o3  0.45
## 13       o3       o3  1.00
## 16    death     pm10  0.00
## 17 dewpoint     pm10  0.33
## 18       o3     pm10  0.21
## 19     pm10     pm10  1.00
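If reshape2 is not available, the same long format can be obtained with base R. This is a sketch of an alternative route, not the method used in this tutorial; it uses three columns of the built-in mtcars data as a stand-in, and the names cm and cm_long are hypothetical.

```r
# Correlation matrix of three built-in mtcars columns (stand-in for chic)
cm <- round(cor(mtcars[, c("hp", "mpg", "wt")], method = "pearson"), 2)

# Drop the redundant lower triangle, as above
cm[lower.tri(cm)] <- NA

# Matrix -> table -> long data frame, then remove the NA records
cm_long <- na.omit(as.data.frame(as.table(cm),
                                 responseName = "value",
                                 stringsAsFactors = FALSE))
```

The result has the same Var1, Var2 and value columns that melt() produces, so the plotting code below would work with it unchanged.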

For the plot we will use geom_tile but if you have a lot of data you might consider geom_raster which can be much faster.

ggplot(corm, aes(Var2, Var1)) +
   geom_tile(data = corm, aes(fill = value), color = "white") +
   labs(x = "Variable 2", y = "Variable 1") +
   scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, 
                        limit = c(-1, 1), name = "Correlation\n(Pearson)") +
   theme(axis.text.x = element_text(angle = 45, size = 11, vjust = 1, hjust = 1)) +
   coord_equal()

Create a Contour Plot

Contour plots are a nice way to display three-dimensional data by indicating the thresholds of values. Here, we are going to plot the dew point (i.e. the temperature at which airborne water vapor will condense to form liquid dew) related to temperature and ozone levels:

## interpolate data
library(akima)
fld <- with(chic, interp(x = temp, y = o3, z = dewpoint))

## prepare data in long format
library(reshape2)
df <- melt(fld$z, na.rm = T)
names(df) <- c("x", "y", "Dewpoint")
df$Temperature <- fld$x[df$x]
df$Ozone <- fld$y[df$y]

g <- ggplot(data = df, aes(x = Temperature, y = Ozone, z = Dewpoint)) +
         theme(panel.background = element_rect(fill = "white"),
               panel.border = element_rect(colour = "black", fill = NA),
               legend.title = element_text(size = 15),
               axis.text = element_text(size = 12),
               axis.title.x = element_text(size = 15, vjust = -0.5),
               axis.title.y = element_text(size = 15, vjust = 0.2),
               legend.text = element_text(size = 12))
         
g + stat_contour(aes(colour = ..level.., fill = Dewpoint))

Surprise! As it is defined, the dew point is in most cases equal to the measured temperature.

The lines indicate different levels of dew points, but this is not a pretty plot. Let’s try a tile plot with the viridis color palette:

g + geom_tile(aes(fill = Dewpoint)) +
    scale_fill_viridis(option = "inferno")

How does it look if we combine a contour plot and a tile plot to fill the area under the contour lines?

g + geom_tile(aes(fill = Dewpoint)) + 
    stat_contour(colour = "white", size = 0.7, bins = 5) + 
    scale_fill_viridis()

Working with Ribbons (AUC, CI, etc.)

This is not the perfect dataset for this, but using ribbon can be useful. In this example we will create a 30-day running average using the filter() function so that our ribbon is not too noisy.

chic$o3run <- as.numeric(stats::filter(chic$o3, rep(1/30, 30), sides = 2))

ggplot(chic, aes(date, o3run)) +
   geom_line(color = "chocolate", lwd = 1) +
   labs(x = "Year", y = "Ozone")
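What stats::filter() does is easier to see on a short vector. The sketch below uses a hypothetical vector x and a window of 3 instead of 30: each value is replaced by the weighted sum of its neighbours, and with sides = 2 the window is centered, so the ends, where the window is incomplete, become NA.

```r
# Short toy series instead of the daily ozone values
x <- c(1, 2, 3, 4, 5, 6)

# Centered 3-point running mean: equal weights of 1/3
run3 <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
```

The same happens at both ends of chic$o3run, which is why na.rm = T is needed when a summary such as the standard deviation is computed from it.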

How does it look if we fill in the area below the curve using the geom_ribbon() function?

ggplot(chic, aes(date, o3run)) +
   geom_ribbon(aes(ymin = 0, ymax = o3run), fill = "orange", color = "orange", alpha = 0.4) +
   geom_line(color = "chocolate", lwd = 1) +
   labs(x = "Year", y = "Ozone")

Nice to indicate the area under the curve (AUC) but this is not the conventional way to use geom_ribbon(). Instead, we draw a ribbon that gives us one standard deviation above and below our data:

chic$mino3 <- chic$o3run - sd(chic$o3run, na.rm = T)
chic$maxo3 <- chic$o3run + sd(chic$o3run, na.rm = T)

ggplot(chic, aes(date, o3run)) +
   geom_ribbon(aes(ymin = mino3, ymax = maxo3), fill = "lightskyblue", color = "lightskyblue") +
   geom_line(color = "royalblue4", lwd = 0.7) +
   labs(x = "Year", y = "Ozone")

Working with Smoothings

It is amazingly easy to add a smoothing to your data using ggplot2. You can simply use stat_smooth() which will add a LOESS (locally weighted scatterplot smoothing) if you have fewer than 1000 points or a GAM (generalized additive model) otherwise. Since we have more than 1000 points, the smoothing is based on a GAM.

Default: Adding a LOESS or GAM Smoothing

Here it is at its simplest, with not even a formula required. Because chic has more than 1000 observations, the method is automatically set to gam:

ggplot(chic, aes(date, temp)) + 
   geom_point(color="firebrick")+
   labs(x = "Year", y = "Temperature") +
   stat_smooth()
## `geom_smooth()` using method = 'gam'
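
If you prefer not to rely on the automatic choice, the method can be set explicitly. The following is a minimal sketch on simulated data (the data frame sim and its columns are made up for illustration). It assumes the mgcv package is installed, because the GAM smooth is fitted with it when the plot is drawn.

```r
library(ggplot2)

# Simulated data: hypothetical stand-in for chic
set.seed(1)
sim <- data.frame(x = seq(0, 10, length.out = 200))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

# Force a LOESS smooth, regardless of the number of observations
p_loess <- ggplot(sim, aes(x, y)) +
   geom_point(color = "grey60") +
   stat_smooth(method = "loess", formula = y ~ x)

# Force a GAM smooth (s() comes from mgcv)
p_gam <- ggplot(sim, aes(x, y)) +
   geom_point(color = "grey60") +
   stat_smooth(method = "gam", formula = y ~ s(x))
```

Printing p_loess or p_gam draws the respective plot.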

Specifying the Formula for Smoothing

ggplot2 allows you to specify the model you want it to use. Let’s say you want to increase the GAM dimension (add some additional wiggles to the smooth):

ggplot(chic, aes(date, temp)) + 
   geom_point(color="grey60")+
   labs(x = "Year", y = "Temperature") +
   stat_smooth(method = "gam", formula = y~s(x, k = 1000), 
               se = F, size = 1.3, aes(col = "1000")) +
   stat_smooth(method = "gam", formula = y~s(x, k = 100), 
               se = F, size = 1, aes(col = "100")) +
   stat_smooth(method = "gam", formula = y~s(x, k = 10), 
               se = F, size = 0.8, aes(col = "10")) +
   scale_colour_manual(name = "k", values=c("darkorange1", "firebrick", "dodgerblue3"))

Adding a Linear Fit

Though the default is a smooth, it is also easy to add a standard linear fit:

ggplot(chic, aes(temp, death)) +
   geom_point(color = "firebrick") +
   labs(x = "Temperature", y = "Deaths") +
   stat_smooth(method = "lm", col = "darkorange1", se = F, size = 1.3)
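
With method = "lm" and no formula given, the line corresponds to an ordinary least-squares fit of y on x. If you also want to report the intercept and slope, you can fit the same model with lm(). A minimal sketch with simulated data (sim, temp and death here are made-up stand-ins, not the Chicago data):

```r
# Simulated stand-in data; the true relationship is chosen arbitrarily
set.seed(42)
sim <- data.frame(temp = runif(100, 0, 90))
sim$death <- 130 - 0.3 * sim$temp + rnorm(100, sd = 5)

# The same kind of model that stat_smooth(method = "lm") fits by default (y ~ x)
fit <- lm(death ~ temp, data = sim)

# Intercept and slope of the fitted line
coef(fit)
```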

Working with Themes

Change the Overall Plotting Style

You can change the entire look of the plots by using themes. As an example, Jeffrey Arnold has put together the package ggthemes with several custom themes. For a list you can visit the ggthemes site. Without any coding you can adopt several styles, some of them well known for their aesthetics.

Here is an example copying the plotting style of The Economist magazine:

library(ggthemes)

ggplot(chic, aes(date, temp, color = factor(season))) +
   geom_point() +
   labs(x = "Year", y = "Temperature") + 
   ggtitle("Ups and Downs of Chicago's Daily Temperatures") +
   theme_economist() + 
   scale_colour_economist(name = "Seasons:") +
   theme(legend.title = element_text(size = 12, face = "bold"))

Another example is the plotting style of Tufte, a minimal-ink theme based on Edward Tufte’s book The Visual Display of Quantitative Information. This is the book that popularized Minard’s chart depicting Napoleon’s march on Russia as one of the best statistical drawings ever created. Tufte’s plots became famous for the purism of their style. But see for yourself:

set.seed(2017)
chic.red <- chic[sample(nrow(chic), 50), ]

(t <- ggplot(chic.red, aes(temp, o3)) +
   geom_point() +
   labs(x = "Temperature", y = "Ozone") + 
   ggtitle("Temperature and Ozone Levels in Chicago") +
   theme_tufte() +
   stat_smooth(method = "lm", col = "black", size = 0.7, fill = "gray60", alpha = 0.2))

t + geom_rangeframe()

Since Tufte’s style is about minimalism, we first reduced the number of data points shown to (at least) try to follow his rules. (Never mind the stat_smooth() call, which is covered in the chapter “Working with Smoothings”; it was added only to make the plot more interesting.)

If you like this style of plotting, have a look at this blog entry recreating several Tufte plots in R.

Change the Size of All Plot Text Elements

It is incredibly easy to change the size of all the text elements at once.

If you have a closer look at the default theme (see chapter “Create and Use Your Custom Theme” below) you will notice that the sizes of all the elements are relative (rel()) to the base_size. As a result, you can simply change the base_size if you want to increase readability of your plots:

theme_set(theme_gray(base_size = 30))

ggplot(chic, aes(date, temp, color = factor(season))) + 
   geom_point() + 
   labs(x = "Year", y = "Temperature") + 
   guides(colour = F) 
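
To check how the relative sizes resolve for a given base_size, ggplot2's calc_element() returns a theme element with all inherited settings applied. A minimal sketch (the object names small and large are made up):

```r
library(ggplot2)

# Axis text is defined as rel(0.8) in theme_gray(), so it scales with base_size
small <- calc_element("axis.text", theme_gray(base_size = 11))
large <- calc_element("axis.text", theme_gray(base_size = 30))

# Resolved font sizes in points
c(small = small$size, large = large$size)
```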

Create and Use Your Custom Theme

If you want to change the theme for an entire session you can use theme_set(), as in theme_set(theme_bw()). The default is called theme_gray. If you want to create your own custom theme, you can extract the code directly from the gray theme and modify it. Note that the rel() function changes the sizes relative to the base_size.

theme_gray
## function (base_size = 11, base_family = "") 
## {
##     half_line <- base_size/2
##     theme(line = element_line(colour = "black", size = 0.5, linetype = 1, 
##         lineend = "butt"), rect = element_rect(fill = "white", 
##         colour = "black", size = 0.5, linetype = 1), text = element_text(family = base_family, 
##         face = "plain", colour = "black", size = base_size, lineheight = 0.9, 
##         hjust = 0.5, vjust = 0.5, angle = 0, margin = margin(), 
##         debug = FALSE), axis.line = element_blank(), axis.line.x = NULL, 
##         axis.line.y = NULL, axis.text = element_text(size = rel(0.8), 
##             colour = "grey30"), axis.text.x = element_text(margin = margin(t = 0.8 * 
##             half_line/2), vjust = 1), axis.text.x.top = element_text(margin = margin(b = 0.8 * 
##             half_line/2), vjust = 0), axis.text.y = element_text(margin = margin(r = 0.8 * 
##             half_line/2), hjust = 1), axis.text.y.right = element_text(margin = margin(l = 0.8 * 
##             half_line/2), hjust = 0), axis.ticks = element_line(colour = "grey20"), 
##         axis.ticks.length = unit(half_line/2, "pt"), axis.title.x = element_text(margin = margin(t = half_line), 
##             vjust = 1), axis.title.x.top = element_text(margin = margin(b = half_line), 
##             vjust = 0), axis.title.y = element_text(angle = 90, 
##             margin = margin(r = half_line), vjust = 1), axis.title.y.right = element_text(angle = -90, 
##             margin = margin(l = half_line), vjust = 0), legend.background = element_rect(colour = NA), 
##         legend.spacing = unit(0.4, "cm"), legend.spacing.x = NULL, 
##         legend.spacing.y = NULL, legend.margin = margin(0.2, 
##             0.2, 0.2, 0.2, "cm"), legend.key = element_rect(fill = "grey95", 
##             colour = "white"), legend.key.size = unit(1.2, "lines"), 
##         legend.key.height = NULL, legend.key.width = NULL, legend.text = element_text(size = rel(0.8)), 
##         legend.text.align = NULL, legend.title = element_text(hjust = 0), 
##         legend.title.align = NULL, legend.position = "right", 
##         legend.direction = NULL, legend.justification = "center", 
##         legend.box = NULL, legend.box.margin = margin(0, 0, 0, 
##             0, "cm"), legend.box.background = element_blank(), 
##         legend.box.spacing = unit(0.4, "cm"), panel.background = element_rect(fill = "grey92", 
##             colour = NA), panel.border = element_blank(), panel.grid.major = element_line(colour = "white"), 
##         panel.grid.minor = element_line(colour = "white", size = 0.25), 
##         panel.spacing = unit(half_line, "pt"), panel.spacing.x = NULL, 
##         panel.spacing.y = NULL, panel.ontop = FALSE, strip.background = element_rect(fill = "grey85", 
##             colour = NA), strip.text = element_text(colour = "grey10", 
##             size = rel(0.8)), strip.text.x = element_text(margin = margin(t = half_line, 
##             b = half_line)), strip.text.y = element_text(angle = -90, 
##             margin = margin(l = half_line, r = half_line)), strip.placement = "inside", 
##         strip.placement.x = NULL, strip.placement.y = NULL, strip.switch.pad.grid = unit(0.1, 
##             "cm"), strip.switch.pad.wrap = unit(0.1, "cm"), plot.background = element_rect(colour = "white"), 
##         plot.title = element_text(size = rel(1.2), hjust = 0, 
##             vjust = 1, margin = margin(b = half_line * 1.2)), 
##         plot.subtitle = element_text(size = rel(0.9), hjust = 0, 
##             vjust = 1, margin = margin(b = half_line * 0.9)), 
##         plot.caption = element_text(size = rel(0.9), hjust = 1, 
##             vjust = 1, margin = margin(t = half_line * 0.9)), 
##         plot.margin = margin(half_line, half_line, half_line, 
##             half_line), complete = TRUE)
## }
## <environment: namespace:ggplot2>

Now, let us modify the default theme function and have a look at the result:

theme_gray.mod <- function (base_size = 12, base_family = "") {
  half_line <- base_size/2
  theme(line = element_line(colour = "black", size = 0.5, linetype = 1, lineend = "butt"), 
        rect = element_rect(fill = "white", colour = "black", size = 0.5, linetype = 1), 
        text = element_text(family = base_family, face = "plain", colour = "black", 
                            size = base_size, lineheight = 0.9, hjust = 0.5, vjust = 0.5, 
                            angle = 0, margin = margin(), debug = F), 
        axis.line = element_blank(), 
        axis.line.x = NULL, 
        axis.line.y = NULL, 
        axis.text = element_text(size = base_size * 1.1, colour = "black"), 
        axis.text.x = element_text(margin = margin(t = 0.8 * half_line/2), vjust = 1), 
        axis.text.x.top = element_text(margin = margin(b = 0.8 * half_line/2), vjust = 0), 
        axis.text.y = element_text(margin = margin(r = 0.8 * half_line/2), hjust = 1), 
        axis.text.y.right = element_text(margin = margin(l = 0.8 * half_line/2), hjust = 0), 
        axis.ticks = element_line(colour = "black", size = 1), 
        axis.ticks.length = unit(half_line, "pt"), 
        axis.title.x = element_text(margin = margin(t = half_line), vjust = 1, 
                                    size = base_size * 1.3, face = "bold"), 
        axis.title.x.top = element_text(margin = margin(b = half_line), vjust = 0), 
        axis.title.y = element_text(angle = 90, margin = margin(r = half_line), vjust = 1, 
                                    size = base_size * 1.3, face = "bold"), 
        axis.title.y.right = element_text(angle = -90, margin = margin(l = half_line), vjust = 0), 
        legend.background = element_rect(colour = NA), 
        legend.spacing = unit(0.4, "cm"), 
        legend.spacing.x = NULL, 
        legend.spacing.y = NULL, 
        legend.margin = margin(0.2, 0.2, 0.2, 0.2, "cm"), 
        legend.key = element_rect(fill = "grey95", colour = "white"), 
        legend.key.size = unit(1.2, "lines"), 
        legend.key.height = NULL, 
        legend.key.width = NULL, 
        legend.text = element_text(size = rel(0.8)), 
        legend.text.align = NULL, 
        legend.title = element_text(hjust = 0), 
        legend.title.align = NULL, 
        legend.position = "right", 
        legend.direction = NULL, 
        legend.justification = "center", 
        legend.box = NULL, 
        legend.box.margin = margin(0, 0, 0, 0, "cm"), 
        legend.box.background = element_blank(), 
        legend.box.spacing = unit(0.4, "cm"), 
        panel.background = element_rect(fill = "white", colour = NA),
        panel.border = element_rect(colour = "black", fill = NA, size = 2),
        panel.grid.major = element_line(colour = "grey80", size = 1.2),
        panel.grid.minor = element_line(colour = "grey80", size = 0.1),
        panel.spacing = unit(base_size, "pt"), 
        panel.spacing.x = NULL, 
        panel.spacing.y = NULL, 
        panel.ontop = F, 
        strip.background = element_rect(fill = "white", colour = "black"), 
        strip.text = element_text(colour = "black", size = base_size), 
        strip.text.x = element_text(margin = margin(t = half_line, b = half_line)), 
        strip.text.y = element_text(angle = -90, margin = margin(l = half_line, r = half_line)), 
        strip.placement = "inside", 
        strip.placement.x = NULL, 
        strip.placement.y = NULL, 
        strip.switch.pad.grid = unit(0.1, "cm"), 
        strip.switch.pad.wrap = unit(0.1, "cm"), 
        plot.background = element_rect(colour = NA), 
        plot.title = element_text(size = base_size * 1.8, hjust = 0.5, vjust = 1, face = "bold", 
                                  margin = margin(b = half_line * 1.2)), 
        plot.subtitle = element_text(size = base_size * 1.3, hjust = 0.5, vjust = 1, 
                                     margin = margin(b = half_line * 0.9)), 
        plot.caption = element_text(size = rel(0.9), hjust = 1, vjust = 1, 
                                    margin = margin(t = half_line * 0.9)), 
        plot.margin = margin(base_size, base_size, base_size, base_size), complete = T)
}

Have a look at the modified theme with its new panel and grid lines as well as axis ticks, texts and titles:

theme_set(theme_gray.mod())

ggplot(chic, aes(date, temp, color = factor(season))) + 
   geom_point() + labs(x = "Year", y = "Temperature") + guides(colour = F)

This way of changing the plot design is highly recommended! It allows you to quickly change any element of your plots by changing it once. Within a few seconds you can plot all your results in a consistent style and adapt it to other needs (e.g. a presentation with a bigger font size or journal requirements).

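You can also make quick changes to the currently active theme with theme_update():

# theme_update() returns the previous theme, so do not assign its result to theme_gray.mod
theme_update(panel.background = element_rect(fill = "gray50"))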

ggplot(chic, aes(date, temp, color = factor(season))) + 
   geom_point() + labs(x = "Year", y = "Temperature") + guides(colour = F)

For further exercises, we are going to reset the theme to its default:

theme_set(theme_gray())
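
Both theme_set() and theme_update() invisibly return the theme that was active before the call. As an alternative to resetting to theme_gray(), you can store that value and restore it later. A minimal sketch (old_theme is a made-up name):

```r
library(ggplot2)

# theme_set() returns the previously active theme
old_theme <- theme_set(theme_bw(base_size = 14))

# ... plots created here use theme_bw() ...

# Restore whatever theme was active before
theme_set(old_theme)
```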

Working with Interactive Graphs

Shiny

Shiny is a package from RStudio that makes it incredibly easy to build interactive web applications with R. For an introduction and live examples, visit the Shiny homepage.

To look at the potential use, you can check out the Hello Shiny examples. This is the first one:

library(shiny)
runExample("01_hello")
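
To give an idea of how a ggplot2 plot fits into such an app, here is a minimal sketch, not taken from the Shiny examples. It uses the built-in faithful dataset; the input and output IDs ("bins", "hist") are made up for illustration.

```r
library(shiny)
library(ggplot2)

# User interface: one slider and one plot
ui <- fluidPage(
   sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
   plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
   output$hist <- renderPlot({
      ggplot(faithful, aes(waiting)) +
         geom_histogram(bins = input$bins, fill = "firebrick")
   })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in an interactive session
```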

Plot.ly

Plot.ly is a great tool for easily creating online, interactive graphics directly from your ggplot2 plots. The process is surprisingly easy and can be done from within R.
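
A minimal sketch of the conversion, assuming the plotly package is installed: its ggplotly() function takes a ggplot object and returns an interactive version. The built-in faithful dataset is used here only to keep the example self-contained.

```r
library(ggplot2)
library(plotly)

# Build a regular ggplot first
p <- ggplot(faithful, aes(eruptions, waiting)) +
   geom_point(color = "firebrick") +
   labs(x = "Eruption time (min)", y = "Waiting time (min)")

# Convert it to an interactive plotly widget
ip <- ggplotly(p)
```

Printing ip in an interactive session opens the widget in the viewer or browser.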

Remarks, Tips & Tricks

Learning ggplot2 Interactively

There is a nice shiny app showing the potential inputs for different geoms. You can easily change the input and see how the plot changes. Have a look!

Using ggplot2 in Loops and Functions

The grid-based graphics functions in lattice and ggplot2 create a graph object. When you use these functions interactively at the command line, the result is automatically printed, but in source() or inside your own functions and loops you will need an explicit print() statement, i.e. print(g) in most of our examples. See also the R FAQ.
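
A minimal sketch of this behaviour, using the built-in faithful dataset and a made-up helper called plot_column(). It assumes a ggplot2 version that provides the .data pronoun (3.0.0 or later).

```r
library(ggplot2)

# Hypothetical helper: draws a histogram of one column
plot_column <- function(data, column) {
   g <- ggplot(data, aes(.data[[column]])) +
      geom_histogram(bins = 30)
   print(g)      # explicit print() is needed inside functions and loops
   invisible(g)  # return the plot object for further use
}

# Each plot is drawn because the helper prints it explicitly
for (col in c("eruptions", "waiting")) {
   plot_column(faithful, col)
}
```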